43 research outputs found

    Forecasting ozone threshold exceedances in urban background areas using supervised classification and easy-access information

    Classification models to forecast exceedance of the ozone (O3) threshold established by European legislation are rare in the literature, as are studies focusing on background O3, which reaches higher concentrations at city outskirts. This study evaluated the performance of nine classifiers in forecasting exceedance of this threshold by background O3. The models used five large hourly background O3 data sets (2006–2015) and included temporal features describing the dynamics of O3 formation. Bagging and stacking ensembles of these classifiers, and their cost of learning, were also evaluated. The C5.0 and nnet classifiers achieved the best forecasting performance, even under imbalanced learning. Bagging ensembles outperformed stacking approaches, although with little accuracy improvement over the individual classifiers. The cost-of-learning analysis showed similar performance when models were trained on reduced fractions of the original data sets. The use of these models to forecast background O3 threshold exceedances is encouraged because of the performance obtained and their easy reproducibility.

    Estimation of Particulate Matter Contributions from Desert Outbreaks in Mediterranean Countries (2015–2018) Using the Time Series Clustering Method

    North African dust intrusions can contribute to exceedances of the European PM10 and PM2.5 limit values and World Health Organisation standards, diminishing air quality and increasing mortality and morbidity at higher concentrations. In this study, the contribution of North African dust in Mediterranean countries was estimated using the time series clustering method. This method combines the non-parametric approach of Hidden Markov Models for studying time series with the definition of different air pollution profiles (regimes of concentration). Using this approach, PM10 and PM2.5 time series obtained at background monitoring stations from seven countries were analysed from 2015 to 2018. The average characteristic contributions to PM10 were estimated as 11.6 ± 10.3 µg·m−3 (Bosnia and Herzegovina), 8.8 ± 7.5 µg·m−3 (Spain), 7.0 ± 6.2 µg·m−3 (France), 8.1 ± 5.9 µg·m−3 (Croatia), 7.5 ± 5.5 µg·m−3 (Italy), 8.1 ± 7.0 µg·m−3 (Portugal), and 17.0 ± 9.8 µg·m−3 (Turkey). For PM2.5, estimated contributions were 4.1 ± 3.5 µg·m−3 (Spain), 6.0 ± 4.8 µg·m−3 (France), 9.1 ± 6.4 µg·m−3 (Croatia), 5.2 ± 3.8 µg·m−3 (Italy), 6.0 ± 4.4 µg·m−3 (Portugal), and 9.0 ± 5.6 µg·m−3 (Turkey). The observed PM2.5/PM10 ratios were between 0.36 and 0.69, and their seasonal variation was characterised, with higher values in colder months. Principal component analysis enabled background sites to be associated on the basis of their estimated PM10 and PM2.5 pollution profiles.

    Finite mixture models to characterize and refine air quality monitoring networks

    Background: Existing air quality monitoring programs are, on occasion, not updated in line with changing local conditions. As a result, they become less informative over time, failing to detect new sources of pollutants or duplicating information. Furthermore, inadequate maintenance of the monitoring equipment is an added drawback. To deal with these issues, a combination of formal statistical methods is used to optimize the resources devoted to monitoring and to characterize the monitoring networks, introducing new criteria for their refinement.
    Methods: Monitoring data were obtained on key pollutants such as carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), particulate matter (PM10) and sulfur dioxide (SO2) from 12 air quality monitoring sites in Seville (Spain) during 2012. A total of 49 data sets were fitted with Gaussian finite mixture models using the expectation-maximization (EM) algorithm. To summarize these 49 models, the mean (μm) and coefficient of variation (cvm) of each mixture were calculated, and a hierarchical clustering analysis (HCA) was carried out to study the grouping of the sites according to these statistics. For pollutants not monitored at a given site, the missing values of these statistics were imputed with a random forests algorithm, using the known values of μm and cvm. A principal component analysis (PCA) was then carried out to better understand the relationship among the sites of the network and the agreement in their classification. All of the techniques were applied using the free, open-source statistical software R.
    Results and conclusions: One example of source attribution and contribution is analyzed using mixture models, and the potential of these models for characterizing pollution trends is proposed. The mixture statistics μm and cvm act as a fingerprint of each model, and their use is new in the characterization of mixture models in air quality management. The imputation technique allowed the estimation of concentration values for unmonitored pollutants and the proposal of new monitoring configurations for this network. The subsequent PCA confirmed the misclassification of one site initially detected with HCA. Universidad de Granada. Máster Universitario en Estadística Aplicada
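
The summary statistics mentioned in the abstract above can be illustrated with a short sketch. It assumes that μm and cvm are the overall mean and coefficient of variation of the fitted Gaussian mixture, obtained from its first and second moments; the function name and the component values are hypothetical and are not taken from the study.

```python
import math

def mixture_mean_cv(weights, means, sds):
    """Return the overall mean and coefficient of variation of a Gaussian mixture."""
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("mixing weights must sum to 1")
    # First moment: weighted average of the component means.
    mean = sum(w * m for w, m in zip(weights, means))
    # Second raw moment of a Gaussian component is sd^2 + mean^2.
    second = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sds))
    variance = max(second - mean ** 2, 0.0)  # guard against rounding below zero
    return mean, math.sqrt(variance) / mean

# Hypothetical two-component fit for one pollutant at one site (illustrative values).
mu_m, cv_m = mixture_mean_cv(weights=[0.7, 0.3], means=[20.0, 50.0], sds=[5.0, 10.0])
print(round(mu_m, 2), round(cv_m, 3))
```

Each (site, pollutant) pair is thus reduced to two numbers, which is what would make a later clustering or PCA step over all sites tractable.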

    Characterization of air pollution from anthropogenic and natural contributions by applying finite mixture models, homogeneous Markov models and other data mining techniques

    A wealth of scientific resources has been dedicated to the study of the sources of air pollutant emissions in urban areas. Such studies may be quantitative, determining the contribution of each source to ambient pollution, or qualitative, providing insight into the makeup of the emissions that affect a city's inhabitants. In Mediterranean countries, pollution caused by natural phenomena, such as the flow of dust from the arid regions of North Africa, is also of primary importance. The flow of particulate matter transcends these geographic areas, passing over the Atlantic Ocean and reaching the American coasts. Among the fundamental tools available for measuring air pollution are the air quality monitoring networks, made up of monitoring stations located in both urban and rural environments, with the aim of determining and reporting on the air quality that affects us. In cities, some of these monitoring stations are located at sites outside the direct range of emission sources, which makes it possible to determine the urban background pollution, indicative of the generalised exposure of the population to air pollution.
    The objectives of this thesis were the following: 1. To exhaustively characterise air pollution in urban and rural areas using the information obtained from the air quality monitoring networks, developing to this end a general methodology for the efficient management of the monitoring networks. 2. To improve the existing methodology used to estimate the contribution of dust carried by warm air masses from the North African regions. 3. To compare air pollution levels between different urban monitoring networks that are unaffected by industrial pollution and lie in different geographic locations, proposing a methodology to characterise ambient and background air pollution.
    To fulfil the first objective, the primary and secondary air pollution monitoring data were modelled using finite mixture models. Based on the first and second moments of these mixtures, hierarchical cluster analysis, imputation using random forests, and principal component analysis were applied. This methodological approach enabled the detection of duplications among the parameters monitored by the stations, thus allowing these networks to be reconfigured and the economic resources invested in them to be optimised. For the second objective, hidden Markov models (HMM) were introduced and the different regimes, or PM10 concentration profiles, were described in some of the time series (TS) studied, enabling an estimation of the contribution of each profile to ambient pollution. The new method proposed for estimating the natural contribution to PM10 improves upon the reference methodology used in the European Union (the monthly moving 40th percentile method) in three ways: it avoids the use of an empirical approximation, it applies modelling specifically designed for time series data, and it provides a confidence interval for the estimated PM10 contributions. For the third objective, hidden Markov models were also used, in this case to define and characterise the ambient and background pollution caused by primary air pollutants in different urban areas of different cities. The attributable fraction for background air pollution was estimated using a new procedure based on the first concentration profile defined by the HMMs in the TS. The ratio and difference between ambient and background concentrations were also studied.
    The results of this thesis for each of these objectives are supported, respectively, by the following publications: 1. Gómez-Losada, Á., Lozano-García, A., Pino-Mejías, R., Contreras-González, J. 2014. Finite mixture models to characterize and refine air quality monitoring networks. Science of the Total Environment, 485-486: 292-9. 2. Gómez-Losada, Á., Pires, J.C.M., Pino-Mejías, R. 2015. Time series clustering for estimating particulate matter contributions and its use in quantifying impacts from deserts. Atmospheric Environment, 117: 271-81. 3. Gómez-Losada, Á., Pires, J.C.M., Pino-Mejías, R. 2016. Characterization of background air pollution exposure in urban environments using a metric based on Hidden Markov Models. Atmospheric Environment, 127: 255-61.
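
For context, the reference approach named in the abstract above (the monthly moving 40th percentile method) can be sketched roughly as follows. This is a simplified illustration, not the official procedure: the window of 15 days on each side, the exclusion of dust days from the percentile, the linear-interpolation percentile and the use of a single series are all assumptions made here, and the data are invented.

```python
import math

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def dust_contribution(pm10, dust_day, half_window=15, q=40):
    """Net dust load per day: observed PM10 minus a moving percentile of non-dust days."""
    loads = []
    for i, (value, is_dust) in enumerate(zip(pm10, dust_day)):
        if not is_dust:
            loads.append(0.0)  # no dust episode, no natural load attributed
            continue
        start, stop = max(0, i - half_window), min(len(pm10), i + half_window + 1)
        clean = [pm10[j] for j in range(start, stop) if not dust_day[j]]
        if not clean:
            loads.append(None)  # background cannot be estimated in this window
            continue
        loads.append(max(0.0, value - percentile(clean, q)))
    return loads

# Invented daily PM10 values (µg/m3) with two flagged dust days.
pm10 = [20.0, 22.0, 18.0, 60.0, 75.0, 21.0, 19.0]
dust_day = [False, False, False, True, True, False, False]
loads = dust_contribution(pm10, dust_day)
print(loads)
```

The percentile threshold is the empirical choice that the thesis abstract says the HMM-based method avoids.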

    Automatic Eligibility of Sellers in an Online Marketplace: A Case Study of Amazon Algorithm

    Purchase processes on Amazon Marketplace begin at the Buy Box, which represents the buy click process through which numerous sellers compete. This study aimed to estimate empirically the relevant seller characteristics that Amazon could consider when featuring sellers in the Buy Box. To that end, 22 product categories from Italy’s Amazon web page were studied over a ten-month period, and the sellers were analyzed through their products featured in the Buy Box. Two different experiments were proposed and the results were analyzed using four classification algorithms (a neural network, random forest, support vector machine, and C5.0 decision trees) and a rule-based classification. The first experiment aimed to characterize sellers unspecifically by predicting their change at the Buy Box. The second one aimed to predict which seller would be featured in it. Both experiments revealed that the customer experience and the dynamics of the sellers’ prices were important features of the Buy Box. Additionally, we proposed a set of default features that Amazon could consider when no information about sellers was available. We also proposed the possible existence of a relationship or composition among important features that sellers could use to be featured in the Buy Box.

    Time series clustering for estimating particulate matter contributions and its use in quantifying impacts from deserts

    Source apportionment studies use prior exploratory methods that are not purpose-oriented, and receptor modelling is based on chemical speciation, requiring costly, time-consuming analyses. Hidden Markov Models (HMMs) are proposed as a routine, exploratory tool to estimate PM10 source contributions. These models were used on annual time series (TS) data from 33 background sites in Spain and Portugal. HMMs enable the creation of groups of PM10 TS observations with similar concentration values, defining the pollutant's regimes of concentration. The results include estimates of source contributions from these regimes, the probability of change among them and their contribution to annual average PM10 concentrations. The annual average Saharan PM10 contribution in the Canary Islands was estimated and compared to other studies. A new procedure for quantifying the wind-blown desert contributions to daily average PM10 concentrations from monitoring sites is proposed. This new procedure seems to correct the net load estimation from deserts achieved with the most frequently used method.
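
The idea of grouping observations into regimes of concentration can be illustrated with a minimal Viterbi decoder for a Gaussian HMM. It is only a sketch: the two regimes, their means, standard deviations and transition probabilities are hypothetical values fixed by hand, whereas in the study the model parameters would be estimated from the monitoring data.

```python
import math

def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def viterbi(obs, start, trans, means, sds):
    """Most likely regime sequence for obs under a Gaussian HMM with known parameters."""
    n = len(means)
    score = [math.log(start[k]) + log_gauss(obs[0], means[k], sds[k]) for k in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for k in range(n):
            # Best predecessor regime for ending in regime k at this step.
            best = max(range(n), key=lambda j: prev[j] + math.log(trans[j][k]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][k]) + log_gauss(x, means[k], sds[k]))
        back.append(pointers)
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Invented daily PM10 series with a low regime (0) and a high regime (1).
obs = [12.0, 15.0, 14.0, 48.0, 55.0, 50.0, 16.0, 13.0]
path = viterbi(obs, start=[0.8, 0.2], trans=[[0.9, 0.1], [0.2, 0.8]],
               means=[15.0, 50.0], sds=[4.0, 8.0])
regime_mean = {k: sum(x for x, s in zip(obs, path) if s == k) / path.count(k)
               for k in set(path)}
print(path, regime_mean)
```

Once each observation carries a regime label, per-regime averages and the share of days spent in each regime follow directly from the labelled series.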

    Modelling background air pollution exposure in urban environments: Implications for epidemiological research

    Background pollution represents the lowest levels of ambient air pollution to which the population is chronically exposed, but few studies have focused on thoroughly characterizing this regime. This study uses statistical clustering techniques as a modelling approach to characterize this pollution regime while deriving reliable information to be used as estimates of exposure in epidemiological studies. The background levels of four key pollutants in five urban areas of Andalusia (Spain) were characterized over an 11-year period (2005–2015) using four widely known clustering methods. For each pollutant data set, the first (lowest) cluster, representative of the background regime, was studied using finite mixture models, agglomerative hierarchical clustering, hidden Markov models (HMM) and k-means. The HMM clustering method outperformed the other techniques, providing important estimates of exposure related to background pollution, such as its mean, acuteness and time incidence values in the ambient air, for all the air pollutants and sites studied.
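
As an illustration of taking the lowest cluster as the background regime, here is a minimal one-dimensional k-means sketch, k-means being one of the four methods compared in the abstract above. The deterministic initialisation, the choice of three clusters and the concentration values are assumptions for the example only.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means on scalar data; returns the final centres and clusters."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres over the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Recompute each centre as its cluster mean; keep the old centre if empty.
        updated = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return centres, clusters

# Invented hourly concentrations (µg/m3).
values = [8.0, 9.0, 10.0, 11.0, 12.0, 30.0, 32.0, 35.0, 70.0, 75.0]
centres, clusters = kmeans_1d(values, k=3)
lowest = min(range(len(centres)), key=lambda c: centres[c])
background_mean = centres[lowest]
background_share = len(clusters[lowest]) / len(values)
print(background_mean, background_share)
```

Here `background_share` is a hypothetical stand-in for a time-incidence measure: the fraction of observations assigned to the lowest cluster.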

    A novel approach to forecast urban surface-level ozone considering heterogeneous locations and limited information

    Surface ozone (O3) is considered a hazard to human health and also affects vegetation, crops and ecosystems. Accurate forecasting of O3 in time and location can help to protect citizens from unhealthy exposures when high levels are expected. Usually, forecasting models use numerous O3 precursors as predictors, which limits the reproducibility of these models to the availability of such information from data providers. This study introduces a methodology for forecasting hourly O3 concentrations 24 h ahead, based on bagging and ensemble learning and using just two predictors with lagged O3 concentrations. This methodology was applied to ten-year time series (2006–2015) from three major urban areas of Andalusia (Spain). Its forecasting performance was contrasted with an algorithm especially designed to forecast time series exhibiting temporal patterns. The proposed methodology outperforms the contrast algorithm and yields results comparable to others in the literature. Its use is encouraged because of its forecasting performance and wide applicability, but also as a benchmark methodology.

    A data science approach for spatiotemporal modelling of low and resident air pollution in Madrid (Spain): Implications for epidemiological studies

    Developing models to assess different air pollution exposures within cities is still a key challenge in environmental epidemiology. Background air pollution is a long-term, resident, low-concentration form of pollution that is difficult to quantify and to which the population is chronically exposed. In this study, hourly time series of four key air pollutants were analysed using Hidden Markov Models to estimate the exposure to background pollution in Madrid from 2001 to 2017. Using these estimates, its spatial distribution was then analysed by combining the interpolation results of ordinary kriging and inverse distance weighting. The ratio of ambient to background pollution differs according to the pollutant studied but is estimated to be about six to one on average. This methodology is proposed not only to describe the temporal and spatial variability of this complex exposure, but also to be used as input in new modelling approaches to air pollution in urban areas. (c) 2018 The Author
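
One of the two interpolators named in the abstract above, inverse distance weighting, can be sketched in a few lines. The station coordinates, values and the power of 2 are hypothetical; ordinary kriging and the way the two sets of results were combined are not shown, since the abstract does not describe them.

```python
import math

def idw(stations, target, power=2.0):
    """Inverse distance weighted estimate at target from (x, y, value) stations."""
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # the target coincides with a station
        w = d ** -power   # closer stations receive larger weights
        num += w * value
        den += w
    return num / den

# Invented background estimates (µg/m3) at three stations on a km grid.
stations = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 3.0, 30.0)]
estimate = idw(stations, target=(2.0, 0.0))
print(round(estimate, 2))
```

With these values the two equidistant stations carry most of the weight and the more distant third one pulls the estimate up slightly.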